1. Description of Progress


This report summarizes the accomplishments of the ISIS project during the entire funding
period. It assumes some knowledge regarding our overall effort.


2.  Summary  of  activities  on  technical  areas  identified  in  the  initial  contract. 

During the first two and one-half years of DARPA funding of the ISIS project our effort
was focused on support for resilient objects: abstract data types capable of tolerating failures.
Below, we list the major tasks that we were asked to address in connection with this concept.
In each case, the actual accomplishments of the project are discussed.

1. Techniques for efficiently implementing resilient objects. It is increasingly clear that
existing methods for producing fault-tolerant software are inadequate. Although several
groups claim to have developed such technologies, the fact that so few distributed systems
actually exist argues that none has yet achieved wide success. Distributed applications
that do exist are generally intolerant of failures, and are often surprisingly
non-distributed in their internal architecture. For example, the UNIX network file system
(NFS) maintains no distributed state information, and the ARPANET mail and bulletin
board facilities are notoriously unreliable. Yet few other distributed "application"
programs of any kind exist.


Several projects have attempted to address this shortcoming. At MIT, the ARGUS project
developed a transactional language for fault-tolerant distributed computing. However,
this approach does not (yet) provide any help if the goal is to replicate data. The V
project, at Stanford, developed a system based on inexpensive remote procedure calls and
the grouping of programs providing a service into process groups that clients can access
with no knowledge of the current group membership. However, V provides little help in
tolerating failures. ISIS-1, the first version of the ISIS system, adopted an approach that
combines elements of these two systems: a resilient object is programmed using a
language similar to ARGUS, but compiles into a distributed program capable of functioning
correctly despite site and program crashes and concurrent requests from application
programs. Our approach was uniquely transparent: programmers who construct a resilient
object describe it as if it were not distributed and failures never happen; replication
of data, failure handling, and recovery mechanisms are all provided by the ISIS-1 system.
Similarly, clients issue RPCs to a resilient object as if it were a single instance of the
specified abstract datatype executing at some single site; actual execution of the request
is automatically handled by ISIS-1.

2. Procedures for improving resilient object performance. Performance is one of the basic
problems we confronted in connection with our resilient object approach, and we
developed a unique concurrent updating mechanism for dealing with this issue. The
mechanism is based on a new asynchronous broadcast primitive, the causal broadcast,
which has turned out to be one of the most important contributions of our two-year effort.
By combining this mechanism with several others, we achieved replicated update performance
that was roughly as fast as a single-site read request would be in a typical system.
This performance exceeds what can be obtained with any other approach.
On the other hand, like ARGUS, resilient objects suffer from some insurmountable forms
of overhead. What we found is that because resilient objects use a transactional correctness
constraint, performance of such an object can never be as good as for a system that
can get away with some cheaper correctness constraint. Moreover, because resilient
objects are discrete entities, and are not compiled directly into their clients, the cost of
just getting a request to them can be high, measured in milliseconds in many cases.
We return to this issue below. To summarize, because of our concurrent update algorithm,
resilient object performance is extremely good, far better than we imagined could
be possible. However, performance could be still better if non-transactional correctness
constraints could be used instead of the ARGUS-like transactional one we adopted, and if
the mechanisms could be unbundled from the rigid "abstract type" framework in which they
were packaged.
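
The causal ordering guarantee mentioned above can be illustrated with a small Python sketch.
It is a single-process simulation under stated assumptions, not the ISIS-1 implementation:
it uses vector timestamps, a well-known general technique for causal delivery that this
report does not describe, and the names CausalProcess and Message are hypothetical. Each
process delays an incoming message until every message that causally precedes it has been
delivered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int      # index of the broadcasting process
    stamp: tuple     # sender's vector timestamp at send time (assumed technique)
    payload: str


class CausalProcess:
    """Toy process that delivers broadcasts in causal order (illustrative only)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n   # clock[k] = number of broadcasts from k delivered here
        self.pending = []      # received but not yet deliverable
        self.delivered = []    # payloads in delivery order

    def broadcast(self, payload):
        # Asynchronous: the sender counts its own message and continues at once.
        self.clock[self.pid] += 1
        self.delivered.append(payload)
        return Message(self.pid, tuple(self.clock), payload)

    def _deliverable(self, msg):
        s = msg.sender
        # Must be the next message from the sender...
        if msg.stamp[s] != self.clock[s] + 1:
            return False
        # ...and everything the sender had seen must already be delivered here.
        return all(msg.stamp[k] <= self.clock[k]
                   for k in range(len(self.clock)) if k != s)

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:                      # retry until nothing more can be delivered
            progress = False
            for m in list(self.pending):
                if self._deliverable(m):
                    self.pending.remove(m)
                    self.clock[m.sender] = m.stamp[m.sender]
                    self.delivered.append(m.payload)
                    progress = True
```

In this sketch, if process 1 broadcasts "b" after delivering process 0's "a", a third
process that happens to receive "b" first holds it in pending and delivers it only after "a"
arrives. The sender never waits for acknowledgements, which is the property that makes such a
primitive attractive for cheap replicated updates.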


3. Techniques for minimizing inter-site message traffic in a system supporting resilient
objects. In this area, we developed a number of techniques for asynchronous broadcasting
and piggybacking. These are interesting because they can boost performance considerably
above the level that can be achieved using a "greedy" point-to-point RPC interaction.
The approach has led us to develop a suite of broadcast primitives that are
integrated into a package, and to code algorithms that exploit the cheapest possible
primitive for each type of activity they undertake. The causal broadcast mentioned above is
one of these; the suite covers a full range of broadcast types. This approach runs
contrary to the accepted wisdom in distributed systems design, which has been increasingly
oriented towards support for efficient RPC mechanisms. As a result, we are deeply
involved in efforts to develop new message transport protocols capable of efficiently
supporting broadcast interactions between sites in a distributed system.


4. A prototype version of the ISIS-1 system capable of supporting simple objects. A prototype
has existed for more than two years, and has been demonstrated to our DARPA program
director (currently, Dr. Dennis Perry). A second generation system, quite different from
this first system, but preserving a form of upward compatibility, is now under
development. We describe it further below.


5. Evaluation of the prototype. A performance evaluation of the ISIS-1 prototype was
undertaken, and reported in [2]. This evaluation was based on some simple application
software for maintaining a distributed database of appointments (a calendar) and for a
simple process control application. The applications performed well enough to confirm
that our concurrent update mechanism yields excellent replicated data update performance.
At the same time, the absolute performance of the system was not as high as we
would like.
Systematic study has revealed that the overhead in question stems primarily from two
aspects. First, the system is transactional and the associated concurrency control mechanisms
are costly. Second, resilient objects exist independent of their clients. A consequence is that
one cannot access such an object without sending it a message, and even when the client and
the object are coresident at a single site, this imposes substantial costs. Moving from UNIX to
a system like V might reduce these costs. Overall, however, a more basic system restructuring
seems to be indicated.

3.  The  distributed  systems  tool  kit 

Early this year we began work on a new system that we expect to complete this summer,
and which is replacing the ISIS-1 prototype in our experimental work here at Cornell. We call
the new system ISIS-2. ISIS-2 retains the mechanisms that worked best in ISIS-1, while stripping
from it the aspects that proved to be bottlenecks. The overall form of this system is as
follows. At the lowest level, it consists of an implementation of a suite of broadcast
communication protocols that support virtually synchronous process groups. This concept, which we
discuss in several papers, represents a breakthrough in our approach to distributed systems
construction. Like V, it is an approach oriented towards support for process groups, but unlike V,
issues relating to tolerance of failures and supporting high levels of concurrency are addressed
as part of the process group mechanism. For example, members of a virtually synchronous
process group can migrate from site to site, or can exchange responsibility for processing some
task, without any risk of a client interacting with the group before the change has completed.
Because all group members receive a given message if any does, and all receive it in the same
"state", the approach creates a remarkably simple environment within which to develop
distributed algorithms. Moreover, performance can be extremely high: in comparison with ISIS-1,
this is a real "RISC" approach to distributed computing. At the same time, it is an
environment fully capable of supporting the mechanisms used in the ISIS-1 prototype.
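
The property that every group member sees a message in the same "state" can be pictured
with a toy Python model. This is a sketch under strong assumptions, not the ISIS-2
protocols: a single in-memory object (the hypothetical ToyGroup) stands in for the broadcast
layer and trivially orders membership changes relative to messages, and failures are not
modeled at all. The point it shows is only the delivery contract: each message is tagged
with the membership view in which it was sent, and every current member receives it in that
view.

```python
class Member:
    """A group member that records events in the order they are delivered."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def deliver(self, event):
        self.log.append(event)


class ToyGroup:
    """In-memory stand-in for a process group (hypothetical, for illustration)."""

    def __init__(self):
        self.members = {}
        self.view_id = 0

    def _install_view(self):
        # A membership change is itself an event delivered to every member.
        self.view_id += 1
        view = ("view", self.view_id, tuple(sorted(self.members)))
        for m in self.members.values():
            m.deliver(view)

    def join(self, member):
        self.members[member.name] = member
        self._install_view()

    def leave(self, name):
        del self.members[name]
        self._install_view()

    def multicast(self, payload):
        # Every current member gets the message, tagged with the same view.
        event = ("msg", self.view_id, payload)
        for m in self.members.values():
            m.deliver(event)
```

For instance, if members a and b have both joined and a message is multicast before a
leaves, both logs contain that message tagged with the same view number, and no member
observes it on the far side of the membership change.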
Few programmers have the sophistication to program at the level of asynchronous
distributed broadcasts, even in an environment providing virtual synchrony (the appearance that
one event happens at a time). Accordingly, we are packaging our primitives into a set of tools
that will form a library of mechanisms covering all the types of actions that distributed
programs need most often. A paper that we will present at the upcoming SOSP conference in
Austin covers this approach, which has generated widespread interest and enthusiasm among
researchers in the field who have learned of it. An implementation of the primitives was
completed this month, and the first version of the toolkit is just starting to limp along. By the end
of this summer, we expect to have a fully operational toolkit with some non-trivial application
software running on it.

4.  Asynchronous  bulletin  boards 

Distributed artificial intelligence programs are often constructed using a bulletin board
paradigm whereby information is shared through one or more common bulletin boards on
which processes can post and read information at will. A bulletin board is a simple and highly
asynchronous form of shared memory, and it occurred to us that bulletin boards might provide a
cheaper mechanism than resilient objects for accomplishing much the same goal. A paper describing our
approach has been written, and an implementation of the mechanism is underway as one of
the "tools" in the distributed systems toolkit reported above.
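
The post and read interface of such a board can be sketched in a few lines of Python. The
sketch is purely illustrative and rests on assumptions: the class name BulletinBoard and its
methods are hypothetical, the board lives in one process, and the replication across sites
that the actual tool would need is omitted entirely.

```python
class BulletinBoard:
    """Minimal single-site bulletin board (illustrative; replication omitted)."""

    def __init__(self):
        self._posts = {}   # subject -> list of (author, body), in posting order

    def post(self, subject, author, body):
        # Posting never blocks on readers: the board is asynchronous shared memory.
        self._posts.setdefault(subject, []).append((author, body))

    def read(self, subject, since=0):
        # Return a copy of the postings on a subject, skipping the first `since`.
        return list(self._posts.get(subject, [])[since:])
```

A reader that remembers how many postings it has already seen can pass that count as
`since` and pick up only the new ones, without coordinating with the processes that post.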

5.  The  importance  of  compatibility 

We used to believe that all distributed applications could somehow be twisted to fit into
the resilient object approach - much as the ARGUS project twists all applications into nested
actions, V into process groups interacting via RPC, and LOCUS into replicated files. UNIX,
which simply provides communication paths, is outstanding at providing high bandwidth data
streams and even network access to file systems, but much weaker at providing fancier
mechanisms on top of these streams. We no longer believe that any one approach - even our
own - can address the needs of every possible class of system. A more realistic approach is to
concede that each of these techniques is nearly ideal for some class of applications, but that
none is ideal for all classes. Thus, the ISIS-2 system may well provide a uniquely high quality
of support for maintaining consistent replicated data structures, migrating tasks from site to
site, and recovering from failures. Yet ISIS-2 will inevitably have to coexist with high
bandwidth stream mechanisms, transactional database mechanisms, and network file systems.
This argues that the toolkit routines must be designed in such a manner as to be compatible
with one another - it should never be the case that the use of one routine renders another
incorrect, or that the use of some other mechanism (like a high bandwidth communication
channel) invalidates an assumption made by some other part of the system. ISIS-2 is being
designed with this in mind. Over a period of time we will provide interfaces to a wide range of
UNIX mechanisms, transactional mechanisms, etc. in such a manner as to guarantee that the
correctness of all of these mechanisms is preserved regardless of how they are combined.

6.  A  serious  application 

At the present stage of our research, serious applications are needed to push ISIS-2 to its
limits. We are collaborating with Keith Marzullo on development of several kinds of such
applications. These will be in the area of distributed file systems (we are constructing an
"RFS" program that will provide file replication and fault-tolerance using normal UNIX NFS
systems as its components), distributed process control (we are investigating the adaptation of
some of Keith's work on clock synchronization to realtime issues within ISIS-2) and
programming in the large. In light of this, we expect ISIS-2 to be supporting a moderate user
community on a regular basis by the end of 1987.

7.  Network  partitioning 

A final area in which we have made recent progress concerns network partitioning. The
current approach to ISIS-2 is intolerant of partitioning - partitioning failures can cause it to
"hang" until the partition resolves itself. In a local area network, this will not be much of a
problem because such networks simply do not partition very often. However, in larger
networks, clusters of ISIS-2 sites may have to be interconnected over links that do partition,
and blocking in this case is unsatisfactory. Most existing work on partitioning is oriented
towards databases, which assume a transactional correctness constraint. As noted above,
ISIS-2 no longer assumes this about application software. Thus new techniques for dealing
with partitioning are needed.

Working with Ajei Gopal, a new graduate student member of the project, we have found some
hope for non-blocking communication across partitionable links. Our approach
allows for a special class of long running protocols, which can be derived in a mechanical
manner from conventional protocols such as the broadcast protocols used within ISIS-2.
Rather than run the usual ISIS-2 protocols directly across links that can partition, these
special protocols would be used in a hierarchical fashion. Good performance, rapid termination,
and tolerance of partitioning result. We expect to complete a paper on this new work during
the summer or fall of this year. Moreover, an implementation will be undertaken as part of
our new system.

8.  Budget 

A  budgetary  summary  for  the  entire  period  of  support  appears  below.  All  expenditures 
are  in  line  with  projections  under  our  original  budget. 
                                  Year I     Year II      Total     Actual
                                  Budget      Budget    Funding   Expenses

Salaries and Wages
  Principal Investigator
    Summer                       $16,600     $17,600    $34,400    $34,044
    Academic year                  5,375       5,840     11,215    $20,150
  Professionals
    Research Associate            12,500      33,000     45,500    $49,352
    Programmer                         0                            $9,000
  Secretary                        4,200       4,600      8,800    $10,106
  Graduate Students
    Academic year                 54,810      50,860    105,670   $129,731
    Summer                        16,800      18,000     34,800
    Ugrad                          4,115       4,420      8,535
  Total Salaries and Wages       114,400     134,520    248,920    252,383

Employee Benefits
    Summer                         1,660       1,780      3,440     $3,404
    Academic year                  6,158      12,272     18,430     24,527

General Expenses
  Travel domestic                  6,000       7,000     13,000    $14,044
  Miscellaneous
    Supplies                       3,600       4,000      7,600     $7,792
    Publications                   2,400       2,400      4,800     $2,857
    Computer supplies              2,000       2,000      4,000       $643
    Computer maintenance           7,200       7,500     14,700     $4,035
    Lecturer fees                                                     $834

Equipment                         68,000                 68,000    $76,360

Indirect cost                     72,876      94,310    167,186    163,197

Total                           $284,294    $265,782   $550,076    550,076

NOTE: Any deviation from the original budget received prior approval
from Dennis Perry.